Using Prosody and Phonotactics in Arabic Dialect Identification
While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode of everyday life; identifying a speaker's dialect is thus critical to speech processing tasks such as automatic speech recognition, as well as to speaker identification. We examine the role of prosodic features (intonation and rhythm) across four Arabic dialects, Gulf, Iraqi, Levantine, and Egyptian, for the purpose of automatic dialect identification. We show that prosodic features can significantly improve identification over a purely phonotactic-based approach, reaching an identification accuracy of 86.33% on 2-minute utterances.
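The combination of prosodic and phonotactic cues described above is often realized as score-level (late) fusion. The sketch below illustrates only the general idea, with invented per-dialect scores and a hypothetical interpolation weight; it is not the paper's actual system.

```python
import numpy as np

DIALECTS = ["Gulf", "Iraqi", "Levantine", "Egyptian"]

def fuse(phonotactic_scores, prosodic_scores, w=0.7):
    """Weighted late fusion of per-dialect scores.

    `w` is a hypothetical interpolation weight that would normally be
    tuned on held-out data; higher scores mean a more likely dialect.
    """
    fused = w * np.asarray(phonotactic_scores) + (1 - w) * np.asarray(prosodic_scores)
    return DIALECTS[int(np.argmax(fused))]

# Invented scores for a single utterance
phon = [-1.2, -0.8, -0.5, -1.9]   # phonotactic model slightly favors Levantine
pros = [-1.0, -0.4, -0.9, -1.6]   # prosodic model favors Iraqi
print(fuse(phon, pros))           # fused evidence settles on Levantine
```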
Automatic Dialect and Accent Recognition and its Application to Speech Recognition
A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences typically introduce modeling difficulties for large-scale speaker-independent systems designed to process input from any variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic signal to build a system that recognizes the regional dialect and accent of a speaker. In particular, we examine frame-based acoustic, phonetic, and phonotactic features, as well as high-level prosodic features, comparing generative and discriminative modeling techniques. We first analyze the effectiveness of approaches to language identification that have been successfully employed by that community, applying them here to dialect identification. We next show how we can improve upon these techniques. Finally, we introduce several novel modeling approaches -- Discriminative Phonotactics and kernel-based methods. We test our best performing approach on four broad Arabic dialects, ten Arabic sub-dialects, American English vs. Indian English accents, American English Southern vs. Non-Southern, American dialects at the state level plus Canada, and three Portuguese dialects. Our experiments demonstrate that our novel approach, which relies on the hypothesis that certain phones are realized differently across dialects, achieves new state-of-the-art performance on most dialect recognition tasks. 
This approach achieves an Equal Error Rate (EER) of 4% for four broad Arabic dialects, an EER of 6.3% for American vs. Indian English accents, 14.6% for American English Southern vs. Non-Southern dialects, and 7.9% for three Portuguese dialects. Our framework can also be used to automatically extract linguistic knowledge, specifically the context-dependent phonetic cues that may distinguish one dialect from another. We illustrate the efficacy of our approach by demonstrating the correlation of our results with the geographical proximity of the various dialects. As a final measure of the utility of our studies, we also show that it is possible to improve ASR. Employing our dialect identification system prior to ASR to identify the Levantine Arabic dialect in mixed speech of a variety of dialects allows us to optimize the engine's language model and use Levantine-specific acoustic models where appropriate. This procedure improves the Word Error Rate (WER) for Levantine by 4.6% absolute (9.3% relative). In addition, we demonstrate in this thesis that, using a linguistically motivated pronunciation modeling approach, we can improve the WER of a state-of-the-art ASR system by 2.2% absolute (11.5% relative) on Modern Standard Arabic.
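The dialect-ID-before-ASR procedure described above amounts to model routing: run dialect identification first, then select dialect-specific acoustic and language models for recognition. A minimal sketch, with illustrative model file names that are not part of any real system:

```python
# Illustrative model routing: pick dialect-specific acoustic and language
# models when the dialect identifier reports a supported dialect, and fall
# back to general (e.g. MSA) models otherwise. File names are invented.
MODELS = {
    "levantine": ("levantine_acoustic.mdl", "levantine_lm.arpa"),
    "default":   ("msa_acoustic.mdl", "msa_lm.arpa"),
}

def select_models(identified_dialect):
    """Return an (acoustic model, language model) pair for the dialect."""
    return MODELS.get(identified_dialect, MODELS["default"])

acoustic_model, language_model = select_models("levantine")
```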
Dialect Recognition Using a Phone-GMM-Supervector-Based SVM Kernel
In this paper, we introduce a new approach to dialect recognition which relies on the hypothesis that certain phones are realized differently across dialects. Given a speaker's utterance, we first obtain the most likely phone sequence using a phone recognizer. We then extract GMM Supervectors for each phone instance. Using these vectors, we design a kernel function that computes the similarities of phones between pairs of utterances. We employ this kernel to train SVM classifiers that estimate posterior probabilities, used during recognition. Testing our approach on four Arabic dialects from 30-second cuts, we compare our performance to five approaches: PRLM; GMM-UBM; our own improved version of GMM-UBM which employs fMLLR adaptation; our recent discriminative phonotactic approach; and a state-of-the-art system: SDC-based GMM-UBM discriminatively trained. Our kernel-based technique outperforms all these previous approaches; the overall EER of our system is 4.9%.
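The building block of this approach is the GMM supervector: the means of a universal background model (UBM) are MAP-adapted to the frames of a phone instance and stacked into a single vector, and a linear kernel then compares such vectors between utterances. The sketch below runs on toy data; the UBM, relevance factor, and features are all invented for illustration.

```python
import numpy as np

def map_adapt_means(frames, ubm_means, ubm_vars, ubm_weights, r=16.0):
    """MAP-adapt UBM means to a set of feature frames (relevance factor r).

    Returns the stacked adapted means, i.e. the GMM 'supervector'.
    """
    # Posterior probability of each mixture for each frame
    # (diagonal-covariance Gaussians, computed in the log domain)
    diff = frames[:, None, :] - ubm_means[None, :, :]                 # (T, M, D)
    log_g = -0.5 * np.sum(diff**2 / ubm_vars + np.log(2 * np.pi * ubm_vars), axis=2)
    log_g += np.log(ubm_weights)
    post = np.exp(log_g - log_g.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                           # (T, M)
    n = post.sum(axis=0)                                              # soft counts (M,)
    fx = post.T @ frames                                              # first-order stats (M, D)
    alpha = (n / (n + r))[:, None]
    adapted = alpha * (fx / np.maximum(n, 1e-8)[:, None]) + (1 - alpha) * ubm_means
    return adapted.ravel()                                            # supervector (M*D,)

# Toy UBM and two utterances' feature frames (hypothetical data)
rng = np.random.default_rng(0)
M, D = 4, 3                                    # mixture components, feature dim
ubm_means = rng.normal(size=(M, D))
ubm_vars = np.ones((M, D))
ubm_weights = np.full(M, 1.0 / M)
sv_a = map_adapt_means(rng.normal(size=(50, D)), ubm_means, ubm_vars, ubm_weights)
sv_b = map_adapt_means(rng.normal(size=(50, D)), ubm_means, ubm_vars, ubm_weights)
kernel_value = float(sv_a @ sv_b)              # linear kernel between supervectors
```

In the paper this is done per phone type, with the kernel aggregating phone-level similarities across utterance pairs; the sketch above shows only the supervector extraction and one linear-kernel comparison.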
An Unsupervised Approach to Biography Production using Wikipedia
We describe an unsupervised approach to multi-document sentence-extraction based summarization for the task of producing biographies. We utilize Wikipedia to automatically construct a corpus of biographical sentences and TDT4 to construct a corpus of non-biographical sentences. We build a biographical-sentence classifier from these corpora and an SVM regression model for sentence ordering from the Wikipedia corpus. We evaluate our work on the DUC2004 evaluation data and with human judges. Overall, our system significantly outperforms all systems that participated in DUC2004, according to the ROUGE-L metric, and is preferred by human subjects.
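The biographical-sentence classifier can be illustrated with a toy stand-in: a Laplace-smoothed unigram Naive Bayes model trained on two tiny invented corpora. The paper builds its classifier from Wikipedia and TDT4 data; nothing below is the actual system.

```python
from collections import Counter
import math

# Tiny invented stand-ins for the biographical (Wikipedia) and
# non-biographical (TDT4) sentence corpora.
bio = ["she was born in 1923 in vienna",
       "he graduated from harvard in 1951",
       "she died in paris at the age of 80"]
non_bio = ["the storm caused flooding across the region",
           "markets fell sharply on tuesday",
           "the committee approved the new budget"]

bio_counts = Counter(w for s in bio for w in s.split())
non_counts = Counter(w for s in non_bio for w in s.split())
vocab_size = len(set(bio_counts) | set(non_counts))

def log_prob(sentence, counts):
    """Laplace-smoothed unigram log-likelihood of a sentence under a class."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in sentence.split())

def is_biographical(sentence):
    return log_prob(sentence, bio_counts) > log_prob(sentence, non_counts)

print(is_biographical("he was born in 1923"))   # True on this toy data
```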
Automated Conversion of Impaired Speech in Communication Applications
Voice communication can be difficult for those with impaired or accented speech. When such users communicate with others via applications on their devices, listeners often find it difficult to understand them. This disclosure describes techniques that dynamically process impaired or accented speech and, with the user's permission, convert it to synthesized canonical speech. Generation of the synthesized speech is performed with low latency as a user speaks, enabling the parties to engage in smooth communication that is unaffected by the speaker's speech impairment. The listeners receive clear, fluent speech automatically generated by suitably trained models. In addition, users can personalize the operation based on their specific speech impairments. The techniques can be integrated within any messaging, conferencing, or phone calling/dialer application on any device, making these applications more accessible to users with impaired speech and enhancing the user experience.
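The low-latency behavior described above can be sketched as chunked streaming: each short audio chunk is converted as it arrives rather than after the whole utterance ends. The conversion function below is a placeholder, not the disclosure's actual model:

```python
def convert_chunk(audio_chunk: bytes) -> bytes:
    """Placeholder for a trained speech-to-speech model that would map
    impaired or accented speech to canonical synthesized speech."""
    return audio_chunk  # identity stand-in; a real model would run here

def stream_convert(chunks):
    """Yield converted audio chunk by chunk, keeping end-to-end latency low."""
    for chunk in chunks:
        yield convert_chunk(chunk)

# Downstream code can play or transmit each converted chunk as soon as
# it is yielded, instead of waiting for the full utterance.
converted = list(stream_convert([b"chunk-1", b"chunk-2"]))
```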